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Abstract. In this work we show that the convergence rate of Orthomin(fc) appHed to systems 
of the form (/ + pU)x = 6, where [/ is a unitary operator and < p < 1, is less than or equal to p. 
Moreover, we give examples of operators U and p > for which the asymptotic convergence rate of 
Orthomin(fc) is exactly p, thus showing that the estimate is sharp. While the systems under scrutiny 
may not be of great interest in themselves, their existence shows that, in general, Orthomin(fc) 
does not converge faster than Orthomin(l). Furthermore, we give examples of systems for which 
Orthomin(fc) has the same asymptotic convergence rate as Orthomin{2) for A; > 2, but smaller than 
that of Orthomin(l). The latter systems are related to the numerical solution of certain partial 
differential equations. 

Key words. Orthomin(fc), iterative methods 

AMS subject classifications. 65F10 

1. Introduction. Developed by Vinsome [B], Orthoinin(fc) (fc = 1, 2, 3, . . . ) is a 
family of iterative methods for solving linear systems of the form 

Ax = h, (1.1) 

where A S Mcl{C) is a nonsingular, possibly non-symmetric matrix, and b G C*. If 
Xn is the n*^ iterate of the method and r„ = 6 — Axn is the n}^ residual, the idea 
behind Orthomin(fc) is to compute the next iterate of the form Xn+i — Xn + x with 
the correction x residing in a fc-dimensional subspace V^^"^ (or (n + l)-dimensional for 
(n + 1) < k) and the Euclidean norm of the next residual r„+i — r„ — Ax minimized: 

||r„+i|| = min \\rn - Ax\\ . (1.2) 

The definition (|1.2[) is equivalent to 

r„+i = r„ - n^^^(„)r„ , (1.3) 

where Ily is the orthogonal projection on a subspace V. The algorithm generates a 
sequence of vectors po,pi,p2, ■ ■ . , called the search directions, and for n > k — 1 

(n) 

the space is generated by the last k search directions pn,Pn-i, ■ ■ ■ ,Pn-k+i', 

for n < fc — 1 the space F^^"'' is simply span{p„,p„_i, . . . ,po}. To give a precise formu- 
lation, for an initial guess xq we initialize the residual and the initial search direction 
by po = ''0 = ^ ^ Axq, and the Orthomin(A;) iteration reads 

X„+i =X„+ XnPn , rn+1 = r„ A„Ap„ , (1.4) 
miii(fc— l,n+l) 

Pn+1 = rn+1 - ^n^Pn-j+l , (1-5) 
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with 



{r„,Apn) 



'^^ = Ta A — V' 71 A — r, J = l,...,min(fc~l,n + l , 1.6 

d _ ii_ • ^rf .__jii_i|dof 



where {u,v) ~ X]j=i denotes the mner product in C*, and = y/ {u, u). The 
coefRcients A„ and in (II. 6p are defined so that 



r„+i ± Ap„ and Apn+i -L Ap„_j+i , j = 1, . . . , min(fc - 1, n + 1) . (1.7) 

A simple inductive argument shows that r„+i _L Apn-j+i for j — \, . . . , min(fc, n + 1), 
and hence (II. 2p holds. 

Orthomin(fc) is attractive for a number of reasons. First, as with other itera- 
tive methods, Orthomin(fc) can be implemented so that each iteration requires only 
one matrix-vector (mat-vec) at an additional cost of 0{kd) Flops; a maximum num- 
ber of k vectors need to be stored. This is different from GMRES where both the 
computational cost per iteration and the storage requirements increase with the iter- 
ation number. Finally, when symmetric positive preconditioners are used to produce 
a split preconditioning of Orthomin(/c), the preconditioned iteration can be imple- 
mented without reference to the factors of the preconditioners. This is a feature 
shared with CG, as shown by Elman ''3j. 

In terms of convergence properties, Orthomin(fc) is guaranteed to converge if the 
field of valucf0 of the matrix A does not contain the origin. The precise convergence 
result and estimate shown below appears in [?] as Theorem 2.2.2, and was proved first 
by Eisenstat, Elman, and Schultz in ^ (see elso Elman [3]) for matrices with positive 
definite symmetric part. We recall the result in ^ as 

Theorem 1.1. Assume that ^ ^{A) and 6 = dist(0, J"(A)). // r„ is the n"" 
residual in the Orthomin{k) iteration, then 

I I2 

(1.8) 

V ii^ir 

where \\A\\ is the 2-norm of the matrix A. 

We also recall from [?] the parallelism between Orthomin(l) and steepest descent 
one one hand, and between Orthomin(2) and the conjugate gradient method (CG) 
on the other. Steepest descent can only be used in connection to symmetric positive 
definite (SPD) systems and has an iteration of the form (|1.4p with the search direction 
given by p„ = r„, just like Orthomin(l). However, for steepest descent the coefficient 
A„ is chosen so that 

Cn+l — Cn — ^spa.n{r„}^rL , 

where Ily is the projection on the subspace V with respect to the A-inner product 
{u,v)j^ = {Au,v). Consequently, the error estimates for steepest descent are similar 
to the ones for Orthomin(l), and in practice the two methods converge comparably 
fast for SPD systems. Analogously, the sequence of search directions po,pi, . . . for 
CG follows a recursion that is similar to Orthomin(2), except for in CG we have 




spaii{p„,p„ + i}^ 



^The field of values or numerical range of a complex matrix A is defined as the set of complex 
numbers J"( A) = {(Am, u) : m G = 1} . 
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In addition, in the case of CG the second set of orthogonahty relations in (|1.7p is 
replaced by the A-orthogonality relation Pn+i -^a Pn (conjugacy), whereas for Or- 
thomin(2) they read Apn+i -L Apn- Even though the superiority of CG over steepest 
descent is well established and understood, not the same can be said about the re- 
lation of Orthomin(2) with Orthoniin(l) for nonsymmetric systems. We quote from 
Anne Greenbaum's text [4| (p. 34): "Unfortunately, no stronger a priori bounds on the 
residual norm are known for Orthomin(2) applied to a general matrix whose field of 
values does not contain the origin although, in practice, it may perform significantly 
better than Orthomin(l)." 

The main contribution of this article is to show that Orthomin(fc) does not 
perform better in general (that is, for matrices A that satisfy ^ J'iA)) than 
Orthomin(l). In Section[2]we consider matrices of the form A = I+pU with < p < 1 
and U unitary, and we conjecture that, under certain conditions, the asymptotic 
convergence rate of Orthomin(fc) for such systems is p; we support our conjecture 
with numerical evidence for k > 2 and we provide analytical arguments for fc = 1 in 
SectionlH which forms the core of this article. Prior to the analysis of the convergence 
rate of Orthomin(l), in Section|3]we give examples of systems for which Orthomin(j), 
= 2, . . . , fc all achieve the same asymptotic convergence rate, but converge faster than 
Orthomin(l). 

2. The main examples. Consider the linear system 

{I + pU)x = b, (2.1) 

where < p < 1, U £ Md{C) is a unitary matrix, and b G C*. Our goal is to assess 
the behavior of the ratios 

\\rn'\\ 

where ri*^'' is the n}^ residual in the Orthomin(fc) iteration. 

2.1. An upper bound. The fact that qn is bounded above by p is a consequence 
of the following result. 

Theorem 2.1. Let A e Md{C) be a normal matrix so that 



aiA) C Bpizo) (2.3) 

with < p < jzol- The residuals r^'^ obtained by applying the Orthomin{k) iteration 
to the system (jl.ip satisfy 

l^ol 



Proof. Let U = p'^iA - zqI). Since a{A) C Bp(zo) we have (t{U) C Si(0). 
Because A is normal it follows that U is also normal, hence ||C/||2 < 1- If PO:Pij ■ • ■ 
are the search directions of Orthomin(fc) we have 

n+l ~ n -^-^span{Ap„,...,Ap„_j} ' n 

where j = min(n, fc — 1). Hence 

ir-Si II < Iki'') - v\\ , V« e span{Ap„, . . . , Ap^-,} . 
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From the construction of the search direction we have ri'^'' G span{p„, . . . ,p„„j}, so 

Ari^'> e span{Ap„, . . . , Ap,^^^} . 

Therefore 

\\ri%\\ < ||rW - Zo'ArO^^l = ^||C/r«|| < ^|kW|| . □ 

Fo| \za\ 

We should note that for normal operators, the field of values being equal to the convex 
hull of the spectrum [5], condition p.3p is equivalent to 



TiA) C Bpizo) . 
Hence, the general result ()1.8p implies 



IkSrII < Jl - Iril^l = (2.5) 

The bound (12. 4p . valid for normal operators only, is significantly sharper than p.Sp . 



2.2. Sharpness of the upper bound. To show that the estimate p.4p is sharp 
we consider the diagonal matrices 

[/ = diag[l,Cd,d,-.-,Cr'] , (2.6) 

where — exp(27ri/(i) is the primitive root of unity of order d. Without loss of 
generality we take zq — 1. 

Conjecture 2.2. For all fc e there exists dfc e N and < pk < I so that for 
all p 6 (0, pfc) and d > dk, the residuals r^'n^ obtained by applying the Orthomin{k) 
iteration to the system (j2.1l) with U of the form (j2.6p and initial value xo = satisfy 

hm = p . (2.7) 

In this article we prove Conjecture 12.21 for k — 1 (see Theorem 14.141 in Section H75)) . 
For fc > 2, the numerical evidence in support of Conjecture 12.21 is quite strong, as 
shown in Section [2731 A consequence of Conjecture 12.21 is that for a given fc £ N we 
can find linear systems for which all of Orthomin(j), j = 1, . . . , fc, achieve the same 
convergence rate. This shows that, in general, Orthomin(fc) does not converge faster 
than Orthomin(l). 

Naturally, for any system in C*, Orthomin((i) will converge in at most d steps. 
In order to find linear systems so that for all fc G N, Orthomin(fc) achieves the same 
asymptotic convergence rate as Orthomin(l), we consider the infinite dimensional 
generalization of (12. 6p . Let dpQ{z) be the Haar probability measure on the unit 
circle S^, and consider the operator 

U:L^iS\dpo)^L^iS\dpo), Uf{z)^zf{z), (2.8) 

and the linear system (|2.1I) with right-hand side b G L'^{§^,dpo) given by 

b{z) = 1, Vz e §1 . 



CONVERGENCE RATE ESTIMATES FOR ORTHOMIN(K) 



5 



Conjecture 2.3. For all < p < 1 and k e N the residuals r^f^ obtained by 
applying the Orthomin{k) iteration to the system (j2.1l) with U of the form (j2.8l) . b as 
above and zero initial guess satisfy 



lim 



(fc) I 

n+l I 



.C^) II 



(2.9) 



In Section we prove Conjecture 12.31 for k ^ 1. 

2.3. Numerical evidence. In our attempt to verify Conjecture 12.21 we con- 
ducted numerical experiments with Orthomin(fc) for the system (|2.1I) with U as 
in (|2.6I) . In Table 2.1 we show the residual norms as well as the ratios q„ 

for d = 13, p = 0.8, and k = 1,2,3,4,5,7,9,11. For k = 5,7,9,11 we only show 
iterates 5-14, as the first 4 iterates are identical to those of Orthomin(4). Note that, 
in general, the first k steps of Orthomin(fc + 1) are identical to those of Orthomin(A;). 
The numbers in Table [^31 show that q„ 0.8 for k — 1, 2, 3, 4, 5, which is confirmed 
by further examining the difference |r„ — 0.8 1. However, for k — 7,9,11 we notice 
that Qn exhibits an oscillatory behavior and does not appear to converge to p = 0.8. 
This is confirmed by Figure \TT\ Further evidence shows that even for k = 7, 9, 11 the 
ratio Qn converges to p if we increase d. This is confirmed in Figure 12.21 We should 
point out that we also experimented with random values for tq , b, as well as small ran- 
dom perturbations pk of the numbers (with \pk\ = 1): we always observe qn ^ p 
as long as d is sufficiently large. 



-Orthomin(7) 
Orthomin(9) 
-Orthomin(11) 




20 25 30 

iteration number 



Fig. 2.1. logio |(j„ - p\ for Orthomin(k), k = 7,9, 11 (p = 0.8, d = 13). 



2.4. Connection with numerical PDEs. Systems of the form (|2.ip arise nat- 
urally in the numerical solution of partial differential equations. Consider the steady- 
state advection-reaction-difFusion equation on [0, 27r] 

- au"{x) + bu{x) + cu{x) = f{x) , a>0, c>0, 6gM, (2.10) 

with periodic boundary conditions w(0) = u(27r), m'(0) — u'{2tt). To obtain a dis- 
cretization of (I2.10|) we proceed as follows: set Xj = jh, j — 0,1, . . .,d, h = li^jd, 
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Table 2.1 

Convergence rates for Orthomin(k) for d = 13, p = 4/5, A; = 1, 2, 3, 4, 5, 7, 9, 11. 



it 


Orthomin(l) 


Orthomin(2) 


Orthomin(3) 


Orthoniin(4) 




\K\\ 


qn 




qn 


WnW 


qn 


WnW 


qn 


1 


3.6056 


0.6247 


3.6056 


0.6247 


3.6056 


0.6247 


3.6056 


0.6247 


2 


2.2524 


0.7533 


2.2524 


0.7156 


2.2524 


0.7156 


2.2524 


0.7156 


3 


1.6967 


0.7832 


1.6118 


0.7768 


1.6118 


0.7533 


1.6118 


0.7533 


4 


1.3289 


0.7935 


1.2520 


0.7919 


1.2141 


0.7873 


1.2141 


0.7725 


5 


1.0544 


0.7974 


0.9915 


0.7958 


0.9559 


0.7957 


0.9379 


0.7927 


6 


0.8407 


0.7989 


0.7891 


0.7985 


0.7606 


0.7984 


0.7435 


0.7975 


7 


0.6717 


0.7996 


0.6301 


0.7993 


0.6073 


0.7990 


0.5929 


0.7991 


8 


0.5371 


0.7998 


0.5036 


0.7997 


0.4852 


0.7996 


0.4738 


0.7996 


9 


0.4296 


0.7999 


0.4028 


0.7999 


0.3880 


0.7999 


0.3789 


0.7997 


10 


0.3436 




0.3222 




0.3103 




0.3030 




it 


Orthoniin(5) 


Orthomin(7) 


Orthomin(9) 


Orthomin(ll) 




\K\\ 


q-n 


\\rn\\ 


qn 


\\rn\\ 


qn 


VnW 


qn 


5 


0.9379 


0.7832 


0.9379 


0.7832 


0.9379 


0.7832 


0.9379 


0.7832 


6 


0.7346 


0.7956 


0.7346 


0.7896 


0.7346 


0.7896 


0.7346 


0.7896 


7 


0.5844 


0.7985 


0.5800 


0.7935 


0.5800 


0.7935 


0.5800 


0.7935 


8 


0.4667 


0.7995 


0.4602 


0.7983 


0.4602 


0.7959 


0.4602 


0.7959 


9 


0.3731 


0.7998 


0.3674 


0.7995 


0.3663 


0.7974 


0.3663 


0.7974 


10 


0.2984 


0.7999 


0.2937 


0.7998 


0.2920 


0.7993 


0.2920 


0.7983 


11 


0.2387 


0.7999 


0.2349 


0.7999 


0.2334 


0.7998 


0.2331 


0.7989 


12 


0.1909 


0.8000 


0.1879 


0.8000 


0.1867 


0.7999 


0.1863 


0.7997 


13 


0.1528 


0.7999 


0.1503 


0.7944 


0.1493 


0.7882 


0.1490 


0.7634 


14 


0.1222 




0.1194 




0.1177 




0.1137 






iteration number 

Fig. 2.2. login - Pl M Orthomin{k) , k = 7,9, 11 (p = 0.8, d = 32). 



be a uniform grid (we identify xq with Xd, x^i with Xd~i, and xi with Xd+i), and 
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replace the derivatives in (I2.10p with the usual centered difference formulas 

_ , 2u(a;j) - ujxj^i) - u{xj+i) , _ u{xj+i) -u{xj^i) 

u [X,) ^ ^2 ' " ~ 2h 

The resulting discretizatior^ is a linear system of type (jl.ll) : A is a normal matrix 
with orthogonal eigenvectors x^'^-' £ and corresponding eigenvalues given by 

(k) -7 \ ^ 2a /, , N • b ■ , , , \ 2a 

X ■ = exp(ifcj/i) , Afc = ~—cos[kh) + i- sm(fc/i) + c + — . 

The eigenvalues lie on an ellipse with semi-axes 2a/ h"^ and b/h; when 2a = hb this 
is a circle of radius b/h. After further rescaling, the system can be brought to the 
form ()2.ip . However, as will be shown in Section [3l this example is very relevant to 
the convergence study of Orthomin(A:) also when 2a hb. 

3. Further examples: normal matrices with spectra on ellipses. So far 

we have examined the systems (j2.ip . and we conjectured that for any k we can find 
operators U of the form (|2.6I) so that for all 1 < j < k, Orthomin(j) achieves an asymp- 
totic convergence rate equal to p. After a trivial rescaling, we restate Conjecture 12.21 
in the following way: for any circle C of center zq and radius p satisfying < p < |zo| 
there exists a normal matrix A whose spectrum lies on C so that for all 1 < j < fc, 
the Orthomin(j) iteration applied to the system with fe=(l,l,...,l)-^ and zero 
initial guess has an asymptotic convergence rate oi p/\zo\. 

In this section we show numerical evidence suggesting that if we replace the cir- 
cle C with a non-circular ellipse £ in the example above, all Orthomin(j) with fc > 2 
achieve the same asymptotic convergence rate p£ , which is smaller than the asymp- 
totic convergence rate of Orthomin(l). For the exact formulation see Conjecture 13. II 
We remark that the discretized numerical PDF from Section 12.41 is an example of 
precisely such a system. 

In order to make the examples very specific we first describe an ellipse £ by its 
semi-axes a > and /? > 0, the angle G K between its axes and the coordinate axes, 
and the position u e C of its center: 

f = {M-|-e'^(Q;cos7-|-i/3sin7) : 7e[0,27r]} . (3.1) 

It is assumed that neither £ nor its interior contain the origin. For d <E N we consider 
the numbers pj S £ defined as 

Pj=u + e^^lacos— — hi/3sin-^j , j = l,...,d. (3.2) 

As before, we associate a linear operator 

=^diag[^i,...,/Xd] . 

To construct an analogous continuous-space operator let dpg^z) denote the arc-length 
measure on £. Define 

As : L^i£, dps) L^i£, dps), A^fiz) - z/(z) . (3.3) 

Conjecture 3.1. For any ellipse £ there exists a number ps e (0, 1) so that the 
following hold: 



^This particular discretization is not appropriate for advection-dominated problems. 
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(i) For all k G N with k > 2, there exists £ N so that for d > the ratio 
In = ll/lkri^'' II of the residual-norm obtained by applying the Orthomin{k) 

iteration with zero initial guess to the system 

A£,d2:= (3.4) 

satisfies 

lim Qn = P£ ■ (3.5) 

n— ^oo 

(a) For any k > 2 the same ratio obtained by applying the Orthomin[k) iteration 
with zero initial guess to the continuous system 

Agx = 1 

also satisfies p.5p . 

{Hi) If the ellipse is not circular, then pg is smaller than the asymptotic conver- 
gence rate of Orthomin[l) . 

Two facts are notable about the behavior of Orthomin(fc) on the systems in 
Conjecture 13.11 First, it is remarkable that the ratios g„ converge at all; indeed, we 
show that for fc = 1 the sequence {qn]n£n is convergent regardless of the choice of 
the numbers /ii, . . . but for fc > 2 the sequence {(7„}„gN may not be monotone, 
and is not expected to converge in general. The second interesting fact is that all 
Orthomin(fc) with k > 2 achieve the same asymptotic convergence rate for sufficiently 
large d. Moreover, numerical experiments show that q„ converges to the same limit p£ 
even for a random initial guess and right-hand side b. However, in spite of the fact 
that p£ seems to be intimately related to the ellipse, currently we do not understand 
the nature of this connection, i.e., how to compute ps using only information about £. 

We conclude this section by showing numerical evidence in support of Conjec- 
ture 13.11 For numerical experiments we have selected an ellipse in general position 
(not aligned with the coordinate axes) with a = 2, (3 = 1, u = 2 -\- i, and 9 = 7r/6 
(see Figure [5TT|) . For d — 128 we solved the system p.4p using Orthomin(fc) with 
fc = l,2,3,4,10. In Figure [3?2] we plot the ratios Qn for each of the solves. The data 
strongly suggests that for k — 2, 3, 4, 10 we have 

qn P£~ 0.6891227 . 

This approximate value (up to the first eight digits) was also obtained when solv- 
ing (j3.4p with random right-hand side and initial guess. In the particular case of 
Orthomin(l), we know that g„ is convergent (and monotone increasing): numerically 
we find that limg„ w 0.7902. 

4. Convergence analysis for Orthomin(l). The main objective of this sec- 
tion is to prove Conjectures 12.21 and 12.31 for fc = 1. In Section [4.11 we show that 
the sequence {(7n}neN is increasing and bounded. After stating in Section 14.21 a few 
technical results, we discuss in Section examples when qn does not converge to p. 
The behavior of g„ for two-dimensional systems is presented in Section In Sec- 
tion |33] we prove Conjecture 12.21 for fc = 1. Sections 14.61 and W7f\ are devoted to the 
infinite-dimensional case (Conjecture 12. 3p . 

We consider matrices of the form 



A = diag[/xi, ...,pd] , (4.1) 



CONVERGENCE RATE ESTIMATES FOR ORTHOMIN(K) 





Fig. 3.1. The ellipse with semi-axes a = 2, /3 = 1, m = 2 + i, and 8 = tt/6 (d = 128j. 




-Orthomin(1) 

e-Orthomin(2) 

-Orthomin(3) 
'-Orthomin(4) 
-Orthomin(IO) 



-» « 



iteration number 

Fig. 3.2. The comparative residual norms for Orthomin(k) (k = 1, 2, 3, 4, 5, lOj.' for 
Orthomin(l) qn exceeds 0.7902, but for k = 2,3, 4, 5, 10 we note a convergence of qn to a value 
near 0.6891227. 



with fj,i, . . . , Hd G C nonzero complex numbers. Since we are interested in the evo- 
lution of the residuals, we retain only the recursive equation from Orthomin(l) that 
produces the residual r„ = rn ^: 

Tn+i = r„ - TlAr^r^ , (4.2) 
with rg e being chosen arbitrarily. Recall from (|1.4p and (|1.6p that 



An — 



1 Tn+l — ^n-^fn 



Let Tn = be the coefficients of r„. We consider the finite probability 

measure supported at 1, . . . ,d with weights proportional to jr^P, . . . , l^^fj^- We will 
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refer to it as the r„-measure, and use the subscript n to denote it. For instance, the 
expected value of a vector C — i^i^ ■ ■ ■ ^ ^d) with respect to this measure is 

Z^fe=l I'nl 

Since r„+i = r„ — A„Ar„ has coefficients = (1 — A„/Life)r^, the foUowing change 
of variable formula holds: 

_ E„(eii-A„Mp) 

E„(|l - A„^|^) 

where /i = (/ii, . . . , /x^) is the vector of eigenvalues of A. In particular, 

4.1. Monotonocity of g„. We first show that the sequence (/„ is increasing and 
bounded. 

Proposition 4.1. //r„ is given by (|4.2I) bkc? A is defined as in (|4.1I) . i/ien (7„ is 
increasing and bounded between and 1. 

Proof. We will use the measure-theoretic notation: 

9n = = En(|l - A„Mr) = E„(l + |A„| VI' - A„M - A„/2) 

= 1 + \Xn\'E„i\^l\') - a„e„(m) - a„e„(a2) ^ ' 



We compare 1 — qfi^i and 1 — qf^. For the latter, we use the change of variable 
formula ((43)) : 

2 _ |E„+i(Ai)|2 _ \En{fl\l-Xnfl\^)\^ 



^n+liW) E„ (|1 - A„Ai|2)E„ (|Ai|2|l - A„mP) 

We can re-write this as 



2 _ |E„(C|1-^P)P 



1 - Qn+l = 



E„(|i-enE„(iepii-ep) 

with ^ = A„/i. By construction, 

E„(e) = E„(ien = {Mf)i!, 
End^r) 

hence we can apply the result of Lemma [4.21 to ^: 



1 _ 2 _ ^ ,|.|2^ ^ |En(e|l-eP)P 2 

^-E„(|l-^P)E„(lCP|l-e|2^ 



hence g„ < g„+i. □ 

Lemma 4.2. Let ^ a complex-valued random variable with finite moments up to 
order 4 satisfying the identity E(^) = E(|^p). The following inequality then holds: 

E(iep)E(|i-ep)E(ieni-eP) > |E(eii-enp . (4.5) 
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Proof. First of all, we remark that if ^ satisfies the condition stated in the Lemma, 
then so does 1 — ^. Thus, the situation is symmetric in ^ and 1 — ^. 

Let 6* e R such that E(^|l - ^p) = e*^|E(^|l - Consider the function 

/:M^R, f{t):^YaT{t{l-0 + e''a^~0) , (4.6) 

where Var(^) — E(|^|^) — |E(C)I^ denotes the variance of a random variable ^. By 
opening up the parenthesis inside the expected value, we obtain 

fit) - t^E{\i - en - iE(i - on + 2t|E(eii - eni + mi'i-^ - 
= t^E{\^f)E(\i - en + 2t|E(eii - eni + E(ieni - er)- 

The second equality follows fom a manipulation of the coefficient of t"^ which takes into 
account the fact that E(e) = Ed^P). This shows that f{t) is a real valued quadratic 
form. The fact that it is a positive definite quadratic form follows from the fact that 
the variance of a random variable is always a positive number. Therefore, f{t) has 
negative discriminant: 

-E(iep)E(|i -ep)E(iep|i -en < . □ 

We should point out that in the case when ^ is real valued (which is not the 
case we are dealing with) , the statement of Lemma 14.21 can be reduced to Pearson's 
inequality [S] (see also [^j) between the skewness r and the kurtosis k of a distribution: 
K — — 1 > 0. We do not give a proof of this fact, as it is of no relevance to the 
rest of the paper. Note that we can think of g„ as measuring the dispersion of the 
random variable ^ relative to the r„ measure: variance about the mean divided by 
average size. The monotonicity of g„ reflects the fact that /i becomes increasingly 
more uniformly distributed relative to the r„ -measures. 

It is important to remark that Proposition l4. II holds for all normal (non-singular) 
matrices since they can be diagonalized using unitary transformations which leave the 
residual norms of Orthomin(fc), and hence g„, unchanged. 

4.2. The case /i^ = 1 + pC,ki \Ck \ = 1: and ro £ arbitrary. In this section 
we assume that A is of the form A ~ I + pU, U = diag[Ci, . . . ,(d], with < p < 1 
and |Ci| — ■ ■ ■ = \(d\ = 1- Also, we keep ro G arbitrary unless otherwise specified. 
We introduce the following quantities, for n > 0: 



{Ur„,rn) 1 - A„ 



(4.7) 



Note that the coefficients of rn+i are related to those of r„ as follows 

r^+i - (1 - Xn^ik)r':, = pA„(T„ - a)r^ (4.8) 
and the change of variable formula becomes 

E.+i(0 = |^^^!^^^- (4.9) 

En(|r„ - ci^) 



Lemma 4.3. For n > we have 



1 + + ZpMuJn P^n + 1 1 + p"^ + 2pJR(jJn 
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where Mz denotes the real part of a complex number z . 

Proof. Let ( — {(i,. . . ,(d)- Clearly, ojn = ^n{0- Since fj, = 1 + p(, we have 

e„(i + pC) 
" e„(|i + pCP)- 

The formula for A„ then follows from the fact that E„(l + pC,) = 1 + puin, and 
E„(|l + pCI^) — 1 + + 2pMujn. Next, the formula of r„ is a direct consequence of 
the formula of A„. Finally, 

2_ |E„(/^)p _ l + p^|cj„p + 2pjRc^„ _ p2(l-Kp) 

l + p^ + 2plRujn l + p^ + 2pRuJn' 

Proposition 4.4. For n > we have |a;„| < 1 and < g„ < p. Moreover, the 
following statements are equivalent: 

(a) lim Qn = p ■ (b) lim a;„ = —p . (c) lim A„ = 1 . (d) lim T„ = . (4.11) 

n— foo n— >-oo n— >oo n— fCJO 



Proof. The bound |a;„| < 1 follows from \\U\\ < 1. The fact that g„ is increasing 
has been proved in the previous section, and the bound qn < p is a direct consequence 
of Theorem 12.11 Since A„, T„, qn are continuous functions of a;„, the statement (b) 

^_ I |2 

clearly implies all the others. We also have (a) (b) since i^p'ij^pjpij^ < li with 
equality for cj = —p. Similarly (d) => (b) since r„ has bounded denominator. Finally 

p{p + UJn) 



An — 



1 + p^ + 2plRujn 



Since the denominator is bounded, lim„ A„ = 1 implies lim„ w„ — —p ((c)^ (b)). □ 
In addition to the quantities defined above, we also define the higher moments 

COn,r-=^^^^-^n{C), J>0. (4.12) 

These moments are used in the convergence proof in Section 14.51 Clearly, u!n,o = 1 
and ujn^i = oJn- Using the change of variable formula 

l(C^) = 



, ^ E„(c^|r„~cP) ^ E„{c^(i + |r„p-r„c-TnC)} 
^n+ns ) E„(|T„-C|2) E„{i + |T„|2-r„c-T„c} 



and we obtain the following 

Proposition 4.5. For n > we have the following recurrence relation 

(1 + \Tn\'^)^n,j — TnUJn,j-l ~ TnLOnj^i r a i n\ 

1 + \Tnr - 2iK(T„w„) 

Corollary 4.6. The collection of moments of the initial distribution {t^o,j}j>o 
completely determine the sequence UJn o,nd implicitly qn. 

Note that when Cfc are the roots of unity of order d we have uJn,j+d — ^n,j', hence 
the finite set of initial moments {i-^o.j}i<j<d completely determine the sequence qn- 
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4.3. Non-convergence to p. Let Hull(Ci, • ■ • , Cd) denote the convex hull of 
^1, . . . , Cd- This is a compact convex subset of C. Since 



the sequence w„ cannot converge to —p unless — p G Hull(Ci, ■ • ■ , Cd)- Since the state- 
ments lim„ LOn = —p and lim„ Qn — p are equivalent, we have the following. 

Proposition 4.7. Assume -p ^ Hull(Ci, . . . , Cd)- Then lim„ q„ ^ p. 

Corollary 4.8. Assume that p e (0, 1) is arbitrary, and \9k\ < vr — arccos(p), 
for 1 < k < d. If Ck = sxp{i0k), then lini(7„ ^ p. 

Proof. The angles are chosen so that K(Cfe) > rho. This ensures —p ^ Hull(Ci , . . . Xd), 
and the previous Proposition applies. □ 

Figures |4T] and |4?2] illustrate the context of Corollary 14.81 Qn does not converge 
to p, and LUn does not converge to —p. We end this section with a sharpened version 
of Conjecture 12.21 for k — 1: 

Conjecture 4.9. For Orthomin{l), if —p e Hull(Ci, . . . , Cd), then g„ — > p. 



10 20 30 40 



90 100 



|co +p| 




20 25 30 

iteration number 



Fig. 4.1. The case when —p does not belong to Hull(fi, . . . , f^); p = 0.9, d = 15. 



4.4. The case d=2. Surprisingly, this case is not completely trivial either. 

Proposition 4.10. Assume d = 2 and the initial vector tq G is arbitrary, 
with non-zero entries. Then (j„ is a constant depending on tq, while aj„ is a periodic 
sequence with period 2. The convergence (|2.7|) for k = I does not hold in this case. 

Proof. With a rotation, we may assume Ci = 1 and C2 = C is arbitrary. Then 
pi — I + p and p2 — ^ + pC- We have 



Ao = 



{ro,Aro) 
{Aro,Aro) 



Mikol' + A2|rgP 
ll^^olP 



Wo 



<\rl\ 



rl\2 



therefore 



1-A, 



-p(l-C)M2 |rg 



212 



1 - X0P2 
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; X 




Fig. 4.2. The case when —p does not belong to Hull((^i, . . . , ^^j); p = 0.9, d = 15. 

On the other hand ri = — XgArQ, hence = (1 — Ao/ii)rQ, and = [1 — Ao/i2)fo- 
Therefore 



1 - Ao^2 
1 - Ao/xi 



M2 ^2 



= rif^ ■ (4.14) 



By applying the same procedure to ri instead of rg, we obtain 



.1|2 



C|r: 



2 12 
2 I 



.1|2 



2|2 

ol 



r.l|2 



412 



I.e. UJ2 



This shows that the sequence a;„ is periodic with period 2. With the above formulae 
for 1 — A/ii and 1 — A/i2, we also have 



llnf _ |l-AMiPKP + |l-Ai/i2nrgP 



„1 12 



r2|2 



\raV\\Aro\\ 



(4.15) 



Let y = koP/koP- The above fraction equals, up to a constant, 



1 + 1/y l + y|/i2|VlMi|^ 



9{y) 



Because of (|4.14p . substituting ri for tq amounts to substituting y by j^^^^. This 

does not change the value of g{y)^ which means that 
that (72 = 91- Similarly, qn = qn-i for n > 2. □ 



I|r2||" _ lllii 



4.5. Convergence of g„ to p. We have already seen that lim„ q. 
only if lim„ A„ = 1. In this section we will work with the quantities 



This proves 
I = p if and 



/3„:=1-A„, w„:=a;„,i, and v„ := tj„,2, 
and we formulate sufficient conditions that guarantee /3„ 0. We have 

Tn+i = Tn - A„(/ + pC/)r„ ^ /3„Ar„ - pUr^ , (4-16) 
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and 

(r„+i,r„+i) = (r„+i,r„ - A„Ar„) = (r„+i,r„) . (4-17) 
Further, since U is unitary, 

{l-p)<\\A\\<{l + p) . (4.18) 

Now, 

, _ (r„+i, Ar„+i) _ (L/r„+i, Ar„+i) 

_ (?7r„+i, r„+i) + p(r„+i, r„+i) _ (£/r„+i, r„+i) + p(r„+i, r„) 
^ P^n+iP ll^r„+i||2 

(L/r„+i,r„+i) + (t/r„+i,p[/r„) (J7r„+i,r„+i +pC/r„) 



^ ^ 



2 



BTTet {Urn+i, (1 - A„)Ar„) _ t ^ (?7r„+i, Ar„) 

Therefore 

((7r„+i,Ar„) - ([/(/?„(/ + pC/)r„ - pt/r„), (/ + p[/)r„) 



/3„+l = 



/?„(([/ + pf/2)r„, (/ + p[/)r„) - (pC/2r„, (/ + pt/)r„) 



P/3«(;5„((l + p^)Un + p{l + Vn)) - P^Un - pV^)-. 



i|Ar„+iP ■ 
Next, the statement 

||r„+i|| = ||/?„Ar„ -pC/r„|| > p||t/r„|| - ||/3„Ar„|l > ||r„||(p - |/3„|(1 + p)) 
imphes 

||r„|| _ ||r„|| ||r„+i|| ^ 1 



Pr„+i|| |lr„+i|| ||Ar„+i|l " (p - |/3„|(1 + p))(l - p) 
Therefore 

|« I a I (l/?»|((l + + p(l + \Vn\)) + p'\Un\ + Pbn 

- (p-|/3„|(l + p))^(l-p)^ 

Next we need to estimate We have 

) _ (t/(/3„Ar„ - p;7r„),^„^r„ - pC/r„) 



(^'n+lj'-n+l) Pn+lP 

|/3„P(AC/r„, Ar„) - p(/3„(Ar„, r„) + /3„([/2r„, Ar„)) + p2(;7r„, r„) 



P„+iP 

hence 

i«„+ii = {m^\\Ar+2pm . pn (4.20) 



|/3„|^(l + p)^ + 2p|/3„|(l + p)+ 

(p-|/3„|(l + p))2 • ^ ■ ^ 
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The analogous inequality can be derived for Vn- We summarize the previous inequal- 
ities in 

Proposition 4.11. The following recurrence relations hold: 

|o 1^ |o I Mil + p^)\u„\ + p{l + + p^\Un\ + p\Vn\ 
\Pn+l\ < P\l3n\ 1^ , .^^2(^ „ ' 



\Vn+l 



< 



[p-|/3„|(l + p)F(l~p)2 
+ p)2 + 2p|/3„|(l + p ) + 
(p-|/3„|(l+p))2 
|/3„|2(1 + p)2 + 2p|/3„|(l + p) + 



\Un+l\< ^ I,-, , , (4.22) 



(p-|/3„|(l+p))2 



We will also need the following inequality which we state without proof. 

Lemma 4.12. For \x\ < 0.1, -jj^^ < 1 + Cx, with C = 2.5. 

Proposition 4.13. Assume the following: < p < 0.1, and cjo.i — wo,2 = 
■^0,3 = 0- Then, for n > 1, we have: 

(z) <p + 2.7EL2P' <p + 3p' ; 

bn|<2.7EL2P'<3p' ; 

(til) |/3„| < 

Proof. We use the recurrence relations ()4.13p to compute the first few terms in 
the sequences /?„ , u„ , w„ . 

_ UJQ.I + p _ pTg _ 

^0 — ~ ~ — P 1 Po — — — — — — 2 ' 



pwo.i + 1 1 + pTo 1 + p2 

(1 + |roP)i^o,i - Jqwo.o - Tbwo,2 



l + |To|2-2iR(rot^o,i) 1 + 



_ _ (1 + |T'oP)a;o,2 - rpi^o.i - Tqujo^s _ 



OJl,l + 1 3 o P^l P 



4 



ri- \ , =p', fii = 



pwia + l " ' " l + pTi l + p4 

The inequalities in the proposition are thus true for n = 1, and we proceed by induc- 
tion. We assume that the statements (i-iii) are true for some n > 1, and we prove 
that they hold for n -I- 1 as well. For that, we rely on the inequalities of Proposition 
14.111 We start with the inequality (iii): 

n+3 p"+M(l + p')(l + 3p) + l + 3p2]+p(l + 3p) + 3p 

^ [l-p"+l(l+/5F(l-p)2 

< n+3 + + 3p) + 1 + 3p2] + p(l + 3p) + 3p 

-P [l-p2(l+p)]2(l-p)2 

The fraction on the right hand side has numerator equal to 4p-|-5p2 -f-3p^ +4p'* + 3p^. 
This is easily seen to be less than 0.5, as < p < 0.1. On the other hand, the 
denominator is certainly greater than 0.9^ y. [1 ~ TTjn)^ > 0.7. Therefore the fraction 
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on right hand side is less than 1, and < /o"^^. 

For inequahty (ii), 

p2(„+2)(i^p)2^2p"+3(l+p)+p2|^„| 
\Un+l\ < 



[p-p"+2(l+p)]2 

^ p2(»+l)(l + p)2+2p"+l(l+p)+|Mn| 
[l-p»+l(l + p)]2 

= 7^ ^ , withx = p"+^ 1 + p , 

(1 — x)^ 

< (1 + Ca;)(x2 + 22; + |u„|) , with C 2.5, 
= |u„| + x[2 + C\un\ + (2C + l)x + Cx^] . 

From the induction step, |u„| < p + Sp^. Also, x = + p) < p^{l + p). The 

quantity inside the square brackets is less than 

2 + C{p + ip") + (2C + l)p'{l + p) + + pf . 

As < p < 1, this is easily seen to be less than 2.5. Therefore, 

\un+i\ < \un\ + 2.5(1 + < |u„| + 2.7p"+i . 

Hence < |wi| + '^■'^J2k=2 ■ "^^^ exact same method is applied to u„+i. □ 

Theorem 4.14. Assume the following hold: 
(a) < p < 0.1; 
(6) d > 4; 
(c) ro = [l,...,lF; 

(c?) Cfe are the roots of unity of order d; 
(e) A^I + pU,U^diag{[C,,...Xd])- 
Then the sequence rn+i ^ r^ — HArni^n satisfies 

n->oo ||r„|| 

Proof. The hypotheses ensure that cjo,i = 1^0,2 = '^o.a = 0. Proposition 14 . 1 3l then 
applies to show 

Km 

lim ~ ^ lim A„ — 1 lim g„ = p . D 

n 71 n 

4.6. The infinite dimensional case. In this section we prove Conjecture 12.31 
for k — 1. Recall that dpo is the Haar probability measure on the unit circle S^, and 
assume that < p < 1 is a fixed parameter (there will be no further restrictions on p 
in this section). For n > 0, we define 

^—5 — r^, := — — — , dp„+i(z) := \z - T^Ydpnyz) . (4.24) 

J^idpn(zj pw„ + 1 

As shown in Section |4?2| the number a;„ is equal to (J7r„,r„) / (r„,r„) where r„ is 
the residual obtained by applying Orthomin(l) to the system ()l.ip with J7 chosen as 
in (|2.8p . and initial residual rg = 1. Our goal is to show that lim„ r„ = which is 
equivalent to Conjecture 12.31 for k ^ 1. For this, we need the following Lemma. 
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Lemma 4.15. For q G C and n > 1, we define the polynomial in X : 

n n 

(X, g) =n(^ =E««^' ' 



fc=i 



where Oi — ai{q) are functions of q. The following identity holds: 



2n\ 



(4.25) 



(4.26) 



Proof. This identity can be proved using an identity of MacMahon (see [1] [7]) 
from the theory of partition functions. We give a few definitions to provide context. 
The q-Pochhammer symbol is defined as 



{x\q)n J|(l 



xq 



i=l 



When \q\ < 1, the product on the right hand side is convergent as n — >■ cx), and the 
hmit is denoted by {x;q)oo- The q-binomial coefficient is defined as 



m 
n 



g {q;q)k{q;q)n-k 

MacMahon's identity ([7], paragraph 323) states that 



(zq,g)™(z-i;g)„= ^ (_i)'^/(fe+i 



m + n 
n + k 



(4.27) 



fe=-n 

Using our notation from (|4.25p . we note that 

$„(x,q)$„(l/a;,(7) = {xq;q'^)n{x^^q;q'^)r 
We apply (|4.27p with q^ instead of q and z = xq^^, to obtain 

n r n 

$„(x,g)$„(l/a;,g)= ^ (-l)V'x'= ^ 

k— — n 

Identifying the constant coefficient and the coefiicient of x on both sides yields 

, E aiflj+i = -q 



(4.28) 



i=0 



2n 

n 



Therefore, 



En— 1 



En 



2n 
n + 1 

-g(i - g'") 

1 _ q2n+2 



i=0 



2n 

n 



2n 

n + 1 



(4.29) 



('?^;g^)n-i(g^;'?^)n+i ('?^;g^)2n 



Theorem 4.16. Let a;„, r„, /z„ 6e de/ined by (gSD). T/ien r„ = p^n+i^ > q. 
/n particular, lim„ T„ — 0, which proves Conjecture \2.3\ for k = \. 
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Proof. We have 

J dfio[zj ■ a + 1 

and the statement is true for n — 0. We proceed by induction. Let n > 1 and assume 
that Tk = p^^-'+i for fc less than n — 1. From the definition, 

n-l 

dMnW = n k-p''+'l'dMoW = |*„(z,p)prfAioW . (4.30) 

fe=0 

With Gi = ai{p), the formula for a;„ reads 

^ _ / z|^n(^:,p)pc;Mo(^) _ / ^1 Sr=o Qj^'Pc^^o 
/ |$„(z,p)|2d^o(2) / IEr=o"»^'P'^A'o 

Since J z^dp,Q{z) vanishes unless i = 0, we have 

We can now apply Lemma 14.151 (note that (p) are real valued) : 

-p(l~p^") ^« +P 2n+2 

which completes the induction step. □ 

Remark. The finite dimensional analogue of Theorem 14.161 is the case when dfiQ 
is the equal weighted discrete probability measure supported at the roots of unity of 
order d. The exact same argument works there as well, but only up to n = c? — 2. 
Beyond that, the right hand side of equation (|4.31l) is complicated by the higher 
correlation sums ^^0^0^+^^, with fc > 1. This is due to the fact that, in the finite 
dimensional case, the integral / z'^iio{z) = J2k=i 6xp( ^""'' ) can be nonzero beyond 
the trivial case n = 0: namely, when n = (modd). Our proof of Theorem l4. 141 fwhich 
allows for some flexibility in the initial conditions) avoided the issue of explicitly 
computing r„. 

4.7. Connection w^ith the Jacobi Triple Product Formula. Recall the 
Jacobi triple product formula ([1], Theorem 2.3) : 

00 00 

n(l-^p''-')(l-^-V''-')(l-p'') = E (-l)'?''^'' \P\ < 1 J^l = 1 • (4.32) 

k—1 k— — oo 

When |2;| = 1 and p E (0, 1), this identity can be re- written as 

oo 

l[\z- p^^-'\^ = [p'.p^rj Y.i~^)'p'^^' ■ (4-33) 
k=i fcez 

If we compare this to (|4.30p we see that the sequence dp-n has a strong limit dp^o, 
whose density (with respect to the Haar measure dpo) is exactly (j4.33l) . 
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Recall the quantities uin,k defined at (|4.12p : in terms of the measures they 
can be written as 

We can see now that lim„ uJn,k can be computed: 

hm a;„,. = ^^f f^^^^'^ ^ i^^if /' . (4.35) 

Conclusions. For fc e N we give examples of linear systems (both finite and 
infinite-dimensional) for which we conjectured that Orthomin(l), . . . , Orthomin(fc) 
achieve the same asymptotic convergence rate. These examples show that, in general, 
Orthomin(A:) does not converge faster than Orthomin(l). We analyze in detail the 
convergence of Orthomin(l) and provide numerical evidence in support of our conjec- 
tures with respect to Orthomin(fc) for A: > 1. The analysis for Orthomin(l) is fairly 
complicated and we do not see a straightforward way to extend the arguments to 
Orthomin(A;) for fc > 1. We provide numerical evidence that certain normal operators 
(related to numerical PDEs) with spectrum lying on an ellipse, have the following 
property: Orthomin(2), Orthomin(3), etc. all have the same asymptotic convergence 
rate (depending only on the ellipse); moreover this is smaller than the asymptotic 
convergence rate of Orthomin(l). This example offers a promising path to finding 
improved convergence rate estimates for Orthomin(2) under additional assumptions 
on the spectrum/field of values of the matrix. An important question, which remains 
unanswered, is whether there are applications where Orthomin(A:), perhaps coupled 
with preconditioners, can compete with the usual iterative solvers for non-symmetric 
systems. 
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